Encoders, decoders, codecs and systems and processes for their operation and manufacture

ABSTRACT

A block encode circuit ( 800 ) including a scanner ( 820 ) operable to scan a block having data values spaced apart in the block by run-lengths to produce a succession of pairs of values of Level and Run representing each data value and run-length, and wherein the Level values include one or more AC values succeeded by a DC value in the succession, and a Run-Level encoder ( 830 ) responsive to said scanner ( 820 ) to encode the values of Level and Run in a same AC to DC order as in the succession of pairs of values from said scanner ( 820 ) to deliver an encoded output. Other encoders, decoders, codecs and systems and processes for their operation and manufacture are disclosed.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a divisional of prior application Ser. No. 12/776,496, filed May 10, 2010, currently pending;

And this application is related to provisional U.S. Patent Application “Fast Residual Encoder In Video Codec” Ser. No. 61/178,726 (TI-66442PS), filed May 15, 2009, for which priority is claimed under 35 U.S.C. 119(e) and all other applicable law, and which is incorporated herein by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not applicable.

COPYRIGHT NOTIFICATION

Portions of this patent application contain materials that are subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document, or the patent disclosure, as it appears in the United States Patent and Trademark Office, but otherwise reserves all copyright rights whatsoever.

BACKGROUND

Fields of Technology are Telecommunications, Digital Signal Processing and Compression and Decompression of Image Data.

Structures and processes are provided for encoding and decoding a residual layer of advanced video codec standards such as AVS. AVS is a Chinese video codec standard. In a conventional AVS residual encoder, the order of scan and the order of entropy coding are opposite to each other.

A block diagram of a residual encoder for a video codec is depicted in FIG. 1. A unit size of encoding and decoding a residual layer is called a block as illustrated by block 110 in FIG. 1. Residual coefficients in a Macroblock have zero value (Run) or non-zero value (Level). In some video codec standards, residual coefficients are compressed into a bitstream by the following steps:

The encoder starts scanning 120 residual coefficients in a block 110 of quantized coefficients starting from the DC coefficient and scanning toward AC coefficients. The term “DC coefficient” refers to the single image transform coefficient representing the unvarying component or term in an image transform resulting from electronic computation, and often this term represents the average image intensity of the image to which the image transform is applied. AC coefficients refer to any or all of the image transform coefficients that represent the amplitude or intensity of each spatially varying component or term of the image transform of an image.

During scanning, the encoder counts a number of consecutive Runs between Levels (called a Scan section in following FIG. 1).

If the encoder finds a Level, the Level value and the number of consecutive Runs (Run-length) are converted into a symbol value. In this step, a conversion table based on Huffman coding theory is applied (called an Entropy Encoder 130 in following FIG. 1).

The symbol value is converted into a bitstream and fed to a stream buffer 140.

In FIG. 2, blocks and a Macroblock and their relationship are illustrated, showing blocks inside a Macroblock. A block has 64 coefficients (8 horizontal by 8 vertical), and a Macroblock has 6 blocks (4 luminance blocks and 2 chrominance blocks) in a 4:2:0 format. Then, 64 coefficients/block × 6 blocks/Macroblock = 384 coefficients/Macroblock.

Clock cycles for encoding or decoding a Macroblock are distributed among tasks in three categories. The first task category is interface control to prepare information for a current Macroblock and a neighboring Macroblock and accomplish data transactions at system level. It can consume around 50 clocks. Second is a Macroblock header process that generates a motion vector predictor, and processes syntaxes in the Macroblock header such as electronic computation of motion vector, Macroblock type, CBP (Coded Block Pattern, read: on/off bit), and quantization parameters, and can consume around 300 clocks. Third is a residual layer process that processes 384 coefficients in each Macroblock and can be quite time consuming, beyond the available real-time processing budget.

As described hereinabove for FIG. 1, encoding the residual layer starts with scanning coefficients in a block. In order to start entropy coding of a block, all coefficients in the block are scanned to get the Level value and Run-length. A rule or order for scanning coefficients is illustrated in FIG. 3. Thus, FIG. 3 depicts a scanning order of residual coefficients in a block. The method in FIG. 3 is called a Zig-Zag Scan. The Level value and Run-length are integrated into one word, and encoded into a bit-stream according to a particular rule or order adopted for the entropy coding.

In an AVS residual encoder processing flow, the order of scan and the order of entropy coding are opposite to each other according to a method for encoding the residual layer as illustrated in the following FIG. 4.

In conventional AVS encoding, the encoder scans the residual coefficients in a block from DC to AC. This method is called a Zig-Zag scan. The encoder checks non-zero coefficients and the number of consecutive zero coefficients before the non-zero coefficient. Here, the non-zero coefficient is called a Level, and the number of consecutive zero coefficients is called a Run. When the encoder encounters a Level, the Level and Run are stored into a buffer memory as in FIG. 4. The reason why the encoder needs to store the Level and the Run is that the entropy coding order (AC-to-DC) is opposite to the DC-to-AC scanning order, which is recognized herein as problematic and emphasized herein by oppositely-directed vertical arrows of FIG. 4. Put another way, the order of scan that constitutes the Run & Level Buffer in FIG. 4 is opposite to the order by which the entropy coder consumes the contents of that Run & Level Buffer. The entropy encoder starts encoding from the last entry of the Run & Level buffer as shown in FIG. 4.

As illustrated in FIGS. 4 and 5, the processing flow of an AVS residual encoder has steps wherein

(1) The encoder starts scanning all coefficients inside a block from DC position to AC (64th) position.

(2) The encoder stores Level and Run-length before the Level when the encoder finds a Level during scanning.

(3) After the encoder has finished scanning all coefficients in the block, the encoder starts entropy coding of the block from the last Level toward the first Level. In FIG. 4, ID=13 (Level=1, Run=2) is firstly encoded. Then ID=12, ID=11 . . . and finally ID=0 (Level=18, Run=0) is encoded.
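By way of illustration only, the conventional flow of steps (1)-(3) might be sketched in Python as follows. The function names and the short eight-coefficient example are hypothetical and are not taken from the Figures. The input list is assumed to hold the coefficients of one block already arranged in zig-zag order, with index 0 at the DC position.

```python
def conventional_scan(coeffs):
    """Scan DC-to-AC and buffer one (level, run) pair per non-zero coefficient.

    `run` is the number of consecutive zeros that precede the Level in
    DC-to-AC order. Trailing zeros after the last Level produce no entry.
    """
    run_level_buffer = []  # stands in for the Run & Level buffer of FIG. 4
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            run_level_buffer.append((c, run))
            run = 0
    return run_level_buffer


def conventional_encode_order(coeffs):
    """Return the pairs in the order the entropy coder consumes them.

    The whole block must be scanned before the first pair can be coded,
    because coding starts from the last buffer entry.
    """
    return list(reversed(conventional_scan(coeffs)))


if __name__ == "__main__":
    block = [18, 0, 0, 3, 0, -1, 0, 0]  # hypothetical, shortened block
    print(conventional_scan(block))          # [(18, 0), (3, 2), (-1, 1)]
    print(conventional_encode_order(block))  # [(-1, 1), (3, 2), (18, 0)]
```

The reversal in `conventional_encode_order` is what forces the full-block buffer: no pair can be handed to the entropy coder until the scan has reached the last position.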

In a first time slot of FIG. 5, the encoder scans block-0 and does not activate entropy coding. In a second slot, the encoder of FIGS. 3A/3B and FIG. 4 scans block-1 and activates entropy coding of block-0, per step (3) of the previous paragraph. Consequently, the encoding pipeline latency is equal to the time period consumed by scanning the previous block. In an AVS residual encoder, the pipeline latency is equal to 64 clocks without any overhead. A functional image of this pipelined architecture is illustrated in FIG. 5 and shows the pipeline latency of AVS.

In order to accomplish the encode in H/W, a ping-pong buffer 335 is provided in FIG. 6. The encoder stores Level and Run-length, as illustrated in FIG. 4, in the ping-pong buffer 335 of FIG. 6. A block diagram of the encoder with two ping-pong buffers 335.1 for parallelized data transfer in and 335.0 for data transfer out is illustrated in FIG. 6. Each buffer 335.0 and 335.1 is capable of storing 64 coefficients. Because maximum Run-length is 63 (defined by 6 bits), each buffer has a capacity of at least 1408 bits, wherein 64 coefficients times the sum of 16 bits/coefficient (Level) plus 6 bits/coefficient (Run) is 1408 bits. Total ping-pong buffer size for the parallelized operation of FIG. 6 is therefore 2 buffers times 1408 bits/buffer, or a total of 2816 bits. Thus, almost 3 Kbits of memory area is involved to do AVS in the way just described.
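The buffer sizing just described can be checked with a few lines of Python. This is merely a worked restatement of the arithmetic in the text; the constant and function names are hypothetical.

```python
COEFFS_PER_BLOCK = 64   # 8 x 8 coefficients per block
LEVEL_BITS = 16         # bits stored per Level entry
RUN_BITS = 6            # bits stored per Run entry (maximum Run-length 63)
NUM_BUFFERS = 2         # ping-pong pair 335.0 and 335.1


def ping_pong_buffer_bits():
    """Return (bits per buffer, total bits) for the conventional ping-pong buffer."""
    per_buffer = COEFFS_PER_BLOCK * (LEVEL_BITS + RUN_BITS)
    return per_buffer, NUM_BUFFERS * per_buffer


if __name__ == "__main__":
    per_buffer, total = ping_pong_buffer_bits()
    print(per_buffer, total)  # 1408 2816
```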

Existence of buffer memory and pipeline latency are problematic and disadvantageous from the standpoints of performance, power and area. A buffer memory of 3 Kbit consumes electric power, and respective control logic must also be provided for each buffer. As illustrated in FIG. 5, processing time is equal to that of encoding 7 blocks, i.e., the 6 blocks of the Macroblock plus one block of pipeline latency.

Note in FIG. 4 that the scanning order and coding order of the residual layer are opposite to each other. In this conventional approach, the encoder firstly scans coefficients in block-0 and results are stored into a buffer-A 210 in the Buffer Area 335.1 of FIG. 6. After the encoder finishes scanning (320) into the block-0, the encoder starts scanning (320) into block-1 (335.1) and does entropy coding (330) of block-0 (335.0) that was stored in buffer-A. The result of scanning block-1 is stored in buffer-B in the Buffer Area of FIG. 6. Scan 320 sends Encoder core 330 the number of Level coefficients in each block and a CBP value of the block. In this manner, all blocks from block-0 to block-5 in a Macroblock are encoded in FIG. 5. Thus, a pair of space-consuming buffers not only expends chip real estate but also introduces a pipeline latency of 1 block (e.g., 70 clocks), keeping total encode clock cycles unacceptably high.

Accordingly, some ways of providing improved encoders and decoders, processes and systems would be very desirable in the art.

SUMMARY OF THE INVENTION

Generally, one form of the invention includes a block encode circuit including a scanner operable to scan a block having data values spaced apart in the block by run-lengths to produce a succession of pairs of values of Level and Run representing each data value and run-length, and wherein the Level values include one or more AC values succeeded by a DC value in the succession, and a Run-Level encoder responsive to the scanner to code the values of Level and Run in a same AC to DC order as in the succession of pairs of values from the scanner to deliver an encoded output.

Generally, one manufacturing process form of the invention includes fabricating on a single integrated circuit chip a block buffer and a scanner operable to scan the block buffer for data values spaced apart in the block by run-lengths to produce a succession of pairs of values of Level and Run representing each data value and run-length, and wherein the Level values include one or more AC values succeeded by a DC value in the succession, and a Run-Level encoder responsive to the scanner to code the values of Level and Run in a same AC to DC order to deliver an encoded output.

Generally, an image encoder process of the invention includes electronically scanning image transform-based coefficients in a particular scanning order and variable-length coding the image transform-based coefficients in substantially the same order.

Generally, a reduced-memory encoder of the invention includes an encoder operable to encode a series of data input thereto, the encode dependent on coding information in a plurality of different coding tables; a re-usable store space for the plurality of different coding tables for encoder support, the same store space otherwise re-usable for other uses than encoder support; a coding memory space able only to hold substantially fewer than the plurality of coding tables, the encoder coupled to access only the coding memory space instead of the re-usable store space for coding table information; and a selection circuit operable to supply a selection signal to the store to deliver an applicable coding table to the coding memory space.

Generally, another manufacturing process of the invention includes fabricating on a single integrated circuit chip a code-related processing circuit dependent on coding information in a plurality of different coding tables to encode a series of data input thereto, a cache memory having a capacity sufficient to hold the different coding tables, a coding memory space substantially smaller than the cache memory, the code-related processing circuit coupled to access only the coding memory space instead of the cache memory for such coding information, and a selection circuit operable to supply a selection signal to the cache memory to deliver an applicable coding table to the coding memory space.

Generally, a decode circuit of the invention includes a Run-Level decoder operable to deliver a succession of pairs of values, each pair including a Run value and a Level value, and an inverse scanner responsive to the succession of pairs of values to populate a block with the Level values including one or more AC values and a DC value, the Level values spaced apart in the block by runs having lengths represented by the Run values, and the inverse-scanner is operable to sequentially populate the block using the Level values and the Run values in a same AC to DC order as in the succession of pairs of values from the Run-Level decoder.

Generally, a decoding process of the invention includes variable-length decoding according to a decoding order to supply image transform-based coefficients and Run values; and inverse-scanning the coefficients using the Run values into a block in substantially the same order as the decoding order.

Generally, a reduced-memory decoder of the invention includes a decoder operable to decode a series of data input thereto, the decode dependent on coding information in a plurality of different coding tables, a re-usable store space for the plurality of different coding tables for decoder support, the same store space otherwise re-usable for other uses than decoder support, a decoding memory space able only to hold substantially fewer than the plurality of coding tables, the decoder coupled to access only the decoding memory space instead of the re-usable store space for coding table information; and a selection circuit operable to supply a selection signal to the store to deliver an applicable coding table to the decoding memory space.

Generally, an electronic system of the invention includes a modem operable to receive a transmission, a Run-Level decoder responsive to the transmission to deliver a succession of pairs of values, each pair including a Run value and a Level value; an inverse scanner responsive to the succession of pairs of values to populate a block with the Level values including one or more AC values and a DC value, the Level values spaced apart in the block by runs having lengths represented by the Run values, and the inverse-scanner operable to sequentially populate the block using the Level values and the Run values in a same order as in the succession of pairs of values from the Run-Level decoder; and a display circuit operable in response to the inverse scanner to form an image signal.

Other encoders, decoders, codecs and systems and processes for their operation and manufacture are disclosed and claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an image encoder.

FIG. 2 is a diagram of a macroblock for a residual layer of image transform-based data and having 8×8 blocks for luma, blue chroma and red chroma.

FIG. 3A is a diagram of a conventional DC-to-AC scan of a block.

FIG. 3B is a line diagram of a zig-zag scanning order in the FIG. 3A DC-to-AC scan of a block.

FIG. 4 is a partially block-tabular, partially flow, diagram of a block of quantized residual data being scanned into a Run & Level buffer in a scan order opposed to a coding order.

FIG. 5 is a block level timing diagram of scan and encoding operations corresponding to FIG. 4 and producing one-block delay or latency.

FIG. 6 is a block diagram of an electronic circuit having a ping-pong buffer to provide enough space for accommodating the operations of FIGS. 5 and 4.

FIG. 7A is a diagram of an inventive process of AC-to-DC scan of a block.

FIG. 7B is a line diagram of an inventive reversed zig-zag scanning order in the FIG. 7A AC-to-DC scan of a block.

FIG. 7C is a partially block-tabular, partially flow, diagram of inventive process and structure showing a block of quantized residual data being scanned directly and unbuffered into a Run & Level encoder in a scan order aligned AC-to-DC with a coding order of the encoder.

FIG. 8A is a signal timing diagram for an inventive process detailing scan and encode control operations for the process and structure of FIGS. 7A and 7B and obviating block delay or latency.

FIG. 8B is a block diagram of an inventive electronic circuit for operating according to the inventive process of FIG. 8A.

FIG. 8C is a flow diagram for an inventive process corresponding to the operations of FIG. 8A for the inventive electronic circuit of FIG. 8B.

FIG. 9 is a partially block-tabular, partially flow, diagram of an inventive process for parallelizing and hiding coding table accesses with the inventive scan and control operations of FIGS. 7A-8C.

FIG. 10A is a block diagram of a hardware encoder core similar to that of FIG. 6 with conventional scan and ping-pong buffer and also with large memory space in the encoder core for all coding tables used by the encoder.

FIG. 10B is a block diagram of a decoder analogous to the encoder of FIG. 10A.

FIG. 11A is a high-level block diagram of an inventive hardware encoder core for high per-clock performance.

FIG. 11B is a high-level block diagram of an inventive decoder core for high per-clock performance.

FIG. 12A is a detailed block diagram for an inventive hardware encoder core of FIG. 11A and accommodating the inventive process and structures of FIGS. 7A-9, and with inventive substantially-reduced memory space compared to FIG. 10A.

FIG. 12B is a detailed block diagram for an inventive hardware decoder core of FIG. 11B and accommodating inventive processes and structures analogous to those in FIG. 12A, and with inventive substantially-reduced memory space compared to FIG. 10B.

FIG. 12C is a further detailed block diagram for an inventive hardware encoder core like that of FIG. 12A.

FIG. 13A is a block diagram of an inventive image encoder system for producing residual coefficients from a current image frame and inventively scanning and encoding them as shown in the other Figures.

FIG. 13B is a block diagram of an inventive image decoder system inventively decoding and inverse-scanning residual coefficients as shown in the other Figures and further providing a decoded picture image.

FIG. 14 is a block diagram of circuit blocks of an inventive electronic system including the inventive circuits of the other Figures.

Corresponding numerals in different Figures indicate corresponding parts except where the context indicates otherwise. A minor variation in capitalization or punctuation for the same thing does not necessarily indicate a different thing. A suffix .i or .j refers to any of several numerically suffixed elements having the same prefix.

DETAILED DESCRIPTION OF EMBODIMENTS

Removal of these hereinabove-described impediments, problems, and expense items confers a big improvement herein.

One type of embodiment herein scans residual coefficients in the same order as entropy coding in such standards, and is implemented into an image and video (IVA) AVS (Chinese standard) high definition (HD) ECD (Entropy Coder and Decoder) core and provides reduced logic size, reduced memory size and improved performance. Various embodiments herein are applicable to AVS, H.264 and any other imaging/video encode and/or decode processes that can similarly benefit from the embodiments, and various parts of the description using the term AVS herein also implicitly relate to H.264 and any other similar image/video processing. Such type of solution is highly beneficial for less chip real estate expense in H/W implementation, especially in a system which is driven at Macroblock level such as in an IVA-HD embodiment.

In order to solve the problems, some embodiments of structure and process or method encode residual coefficients without buffer memory outside of the encoder core and without pipeline latency. In hardware (H/W), the encoder desirably should execute scanning and entropy coding simultaneously with pipelined architecture. The encoder establishes the order of scan to be the same as the order of entropy coding within a block 510. Notwithstanding that run length in AVS residual encoding is in the direction of DC-to-AC, the remarkable scanning order of the encoder embodiment of FIGS. 7A, 7B, 7C herein is reversed so that the scan order and entropy coding order are in the same direction as indicated by AC-to-DC directed arrows of FIG. 7C. Thus, the scanning order of scan 520 is in the index-downward (63-to-0, drawing-upward large arrow in FIG. 7A) direction of AC coefficients toward the DC coefficient, such that AC→DC. So, compared to the conventional process of FIG. 4, the encoder embodiment of FIGS. 7A, 7B, 7C defines a reversed order of scanning (from AC to DC) as illustrated in FIGS. 7A and 7B. Because the entropy encoder 530 of the embodiment of FIG. 7C consumes the Run and Level data in the same AC-to-DC order as the scan block 520 produces that Run and Level data, little or no pipeline latency results and the Run & Level Buffer 210 of FIG. 4 and 335 of FIG. 6 is beneficially dispensed with.

In FIG. 7A, the direction of scanning in a block of the residual data is illustrated. In FIG. 7B, a trace of scanning or visualization of scanning order is illustrated. This remarkable encoder 500 starts scanning from the AC coefficient numbered as 63, and then 62, 61, 60 . . . and finishes scanning at the DC coefficient numbered as 00. Because the order of scanning and the entropy coding are made to be the same as each other in this embodiment, there is no need to store all Levels and Run-lengths into a buffer memory like that of FIG. 4.
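For illustration, a reversed scanning order of the kind traced in FIG. 7B might be generated as in the following Python sketch. The function names are hypothetical. The sketch assumes the widely used zig-zag pattern in which position 1 lies to the right of the DC position; the exact pattern of FIGS. 3B and 7B is not reproduced in this text and may differ (for instance, it may be transposed), in which case only the per-diagonal direction test changes.

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of a DC-to-AC zig-zag scan.

    Assumption: even anti-diagonals are walked bottom-left to top-right and
    odd anti-diagonals top-right to bottom-left, so the order begins
    (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
    """
    order = []
    for s in range(2 * n - 1):          # s = row + col indexes the anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def reversed_zigzag_order(n=8):
    """Return the AC-to-DC order: position 63 first, DC position 0 last."""
    return list(reversed(zigzag_order(n)))


if __name__ == "__main__":
    rev = reversed_zigzag_order()
    print(rev[0], rev[-1])  # (7, 7) (0, 0)
```

A Read Counter that counts from 63 down to zero, as described for the embodiment, corresponds to indexing `zigzag_order()` with a decreasing index rather than materializing the reversed list.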

Notice that conventional AVS Run-length has direction of DC→AC, which is the opposite order to the direction of scanning of the embodiment of FIGS. 7A, 7B, 7C. So, in this embodiment a register called the Level register is prepared or provided, to which register a Level is stored. Storing a Level to this Level register introduces a coefficient-level pipeline latency, but such coefficient-level pipeline latency is negligible compared to the FIG. 5 block-level pipeline latency that is eliminated.

In FIG. 7C, the residual encoder embodiment starts scanning coefficients from AC to DC. When the encoder encounters the first Level coefficient (in this example, it is the “1” of FIG. 7C in cell 24 of FIG. 7A) then a Run counter 528 of FIG. 8B, provided for monitoring, checking for, and counting consecutive zero coefficients in the residual data, is activated and the Level coefficient is registered in the Level register 526. Thus, the Run-length is scanned in reverse compared to conventional AVS scan. After that, when the encoder encounters a next Level coefficient, the encoder sends the already-registered Level coefficient and the latest Run counter value representing the Run-length to the entropy encoder 530, and resets the Run counter 528. That is, the Run counter 528 is reset each time the encoder encounters a Level coefficient during scanning from AC to DC. See also the description of waveforms in FIG. 8A. This process embodiment does not need any buffer memory to store all values of Level and Run-length in a block as a result of scanning. Accordingly, the embodiment of FIG. 7C omits and removes such buffer memory and beneficially eliminates about 70 clocks of latency per block.

Thus, scanning or checking of the coefficients from AC to DC (e.g., 63-to-zero) to make the order of scanning be the same as the order of entropy coding solves issues and problems faced hitherto. All the ping-pong buffer memory (e.g., 3 Kbit) is or can be removed, conferring a reduction of power and area. Control logic for buffer memory is also removed (e.g., 4 Kgate which may amount to 5% reduction). Pipeline latency is reduced by, e.g., 70 clocks (or over 8% of Macroblock time budget). Scanning residual coefficients of a block starts from the lowest-right AC coefficient in FIG. 7C and ends with the highest-left coefficient, which is the DC coefficient.

As illustrated in FIGS. 8A and 8B, such an embodiment of or in an IVA-HD processor (FIG. 14) has waveforms of signals and operations to accomplish AVS residual encoding in a residual encoder embodiment. The residual encoder starts reading residual coefficients in response to a pulse signal named Scan Start from a control logic 540 in FIGS. 8A and 8B and by Read Enable logic 542 thereupon asserting a Read Enable on the next clock. A Read Counter 522 thus enabled starts counting to vary its count index value indicated by the Read Counter 522 to scan a residual buffer 510 holding a block of the residual coefficients from 63 down to zero. After an acceptable access delay of a few clocks (three clocks shown), a Read Data Enable line is asserted and a series of clock-by-clock Read Data d63, d62, . . . d0 enter a Level Detector. Some encoder embodiments initially run the Run counter in response to the Read Data Enable and before the first Level coefficient is encountered, and then send an initial Run count to the decoder, and an analogous decoder uses that initial Run count and positions the received first Level coefficient at its corresponding inverse-scan position.

The Level Detector 524 determines whether each Read datum is NOT zero, and if a not-zero instance occurs, then a Level enable pulse or signal line is asserted during a clock interval respective to each such not-zero instance. (Some not-zero instances are consecutive in FIG. 8A, and the Level enable is correspondingly extended in duration to encompass them.) Each such Level value from the Read data is registered to the Level register 526 as indicated by Level value (Registered) illustration with d62, d59, d55, d53, d52, . . . d[n], d05, d02, d01, d00. (Notice that this series of Level data represents a data example different from the example in FIGS. 4 and 7C, merely to simplify FIG. 8A and elucidate the process.)

When the encoder 500 encounters the first Level coefficient, the Level Detector 524 registers the Level value in Level register 526, asserts Run Counter Enable (0-to-1 after first Level coefficient) and starts incrementing a Run Counter 528 to count up a Run length value. Run Counter 528 increments on each clock if and so long as each successive residual coefficient datum is zero during “run counter enable=1” in FIG. 8A, i.e. during the asserted state of the Run Counter Enable.

If and when the encoder encounters the second Level coefficient or a subsequent Level coefficient, a signal AC Encode Enable is asserted from the Level Detector 524 circuit with the same duration as the corresponding duration of asserted signal Level Enable, and the registered Level value of the first coefficient and Run counter value are encoded as residual encoder output, as indicated by a first vertical down arrow entitled “Encode 1st coefficient in the block.” Several such vertical down-arrows are shown at bottom of FIG. 8A. Also, the second Level coefficient is registered (e.g. d59) in the Level register 526, and the Run Counter 528 in FIG. 8B is reset and starts incrementing. Run counter 528 is reset whenever the encoder encounters a Level coefficient during scanning from AC to DC.

The process repeats analogously for subsequent Level coefficients. When the DC coefficient d00 is read in as Read Data, the signal Level Enable and the signal Run Counter Enable are both de-asserted and go low, and in the same clock cycle a signal pulse DC Encode Enable is asserted high. The last coefficient d00 in the block is encoded, as indicated by the lower-rightmost down arrow in FIG. 8A, and a completion pulse Block Encode Done is issued by the residual encoder. In this way, a light and fast residual encoder embodiment with only one register is achieved and is light, for instance, in terms of real estate and power dissipation. Various alternative circuits can be prepared by the skilled worker informed by the teachings herein to accomplish substantially similar operations or results.

A process embodiment of FIG. 8C for encoding is set forth in the following flow description of flow steps (1)-(5).

(1) Encoder step 610 starts scanning coefficients from bottom-right of a block (position=63 in FIG. 7).

(2) According to a decision step 620, when the encoder finds a Level coefficient for the first time from (1), the Level coefficient is stored, according to a step 630, in the Level register (in FIG. 8A, d62 is the first Level coefficient and is registered), and the Run counter 528 starts counting.

(3) According to a decision step 640, when the encoder finds a next Level coefficient d59 after (2), the encoder sends Level d62 and the value of the Run counter to the entropy encoder. In a step 650, the entropy encoder starts entropy coding of the already-registered Level coefficient d62 and the Run counter value (equal to Run-length). The new Level coefficient d59 is then registered in the Level register, and the Run counter is reset to zero.

(4) In a step 660, the encoder repeats steps 640, 650, i.e. action of (3), until the scanning position reaches DC coefficient position 00.

(5) According to a decision step 670, when the encoder reaches the DC position, an additional one-clock pulse (DC Encode Enable) is asserted according to step 680 in order to start entropy coding of the last Level coefficient d00 with Run-length zero.
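As a behavioral illustration only, the flow steps above can be modeled in Python with a single Level register and a Run counter and no block-sized buffer. This is a software sketch, not the circuit of FIG. 8B; the function name and the shortened example block are hypothetical. One assumption goes beyond the text: if the DC coefficient happens to be zero, the sketch flushes the registered Level together with the accumulated Run count at the end of the scan, whereas the text describes the case in which d00 is a Level.

```python
def reversed_scan(coeffs):
    """Yield (level, run) pairs while scanning from the last position down to DC.

    `coeffs` is assumed to be in zig-zag order with index 0 at the DC
    position. Each pair is available as soon as the next Level is seen,
    so the consumer never waits for the whole block.
    """
    level_reg = None   # models the Level register 526
    run = 0            # models the Run counter 528
    for idx in range(len(coeffs) - 1, -1, -1):
        c = coeffs[idx]
        if c != 0:
            if level_reg is not None:
                yield (level_reg, run)   # encode the already-registered Level
            level_reg = c                # register the new Level
            run = 0                      # reset the Run counter
        elif level_reg is not None:
            run += 1                     # count zeros after the first Level
    if level_reg is not None:
        yield (level_reg, run)           # final flush at the DC position


if __name__ == "__main__":
    block = [18, 0, 0, 3, 0, -1, 0, 0]  # hypothetical, shortened block
    print(list(reversed_scan(block)))   # [(-1, 1), (3, 2), (18, 0)]
```

For this example the pairs come out in the same AC-to-DC order, and with the same Run values, that a DC-to-AC scan followed by a buffer reversal would produce, which is the property that lets the buffer be removed.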

With this process embodiment, pipeline latency is small and equal to clock cycles of Run-length. In processing a residual layer of a Macroblock, each coefficient is processed within one (1) clock, i.e. single-clock per coefficient performance is delivered by this residual encoder embodiment in both of AVS encoder and decoder for residual layer. And such process embodiment of FIG. 7C needs no space-consuming buffer memory area of FIG. 6 to store Run-length and Level for a full one-block latency-wasting time interval of FIG. 5. Memory real estate is or can be removed and thus expense is saved or room for other uses is afforded. Access to such memory is obviated, so control logic for such memory is not provided and thus still more real estate economy is realized. This embodiment reduces cost, i.e. makes a low-cost contribution, and provides a lower-power H/W implementation of a residual encoder that can execute AVS.

A further beneficial impact eliminates certain space consuming tables. In an AVS residual encoder, an operation called CAVLC (context adaptive variable length coding) is applied to scanned Levels and Runs. The CAVLC of AVS has 19 tables: 7 tables for luma intra Macroblock, 7 tables for luma inter Macroblock and 5 tables for chroma Macroblock. For example, a VLC table is changed as context data is updated for each Level coefficient. The functional flow chart is illustrated in following FIG. 9, which shows a procedure of updating a VLC Table on an AVS residual encoder (inter Macroblock) improved as taught herein.

In FIG. 9, a specified maximum coefficient value or defined threshold called max_coef is pre-stored and corresponds to each VLC Table for use by the FIG. 9 process to select which VLC Table to apply. A detailed procedure is set forth in the following flow statement describing FIG. 9, assuming a FIG. 2 Macroblock of the Luma Inter type. In FIG. 9, the process selects which of the 7 Inter tables VLC<0-6>_inter to apply for entropy encoding a posited Luma Inter Macroblock.

(1) Encoder starts reading Level/Run in AC-to-DC order starting from ID=13 in the Run & Level Buffer of FIGS. 4 and 9 or those same Level/Run values first-produced by now-reversed-scan in FIG. 7C. In a flow step 710, the value of max_coef is 0 at the beginning of encoding a block.

(2) The encoder in a step 720 refers to VLC0_inter table, and gets a symbol value from that VLC table or a Code Value from the VLC coding table 750 for the FIG. 12A, 12C VLC Encoder to use in generating a symbol value according to a step 730.

(3) The symbol value is sent by step 740 to exp-golomb encoder (exp means exponential).

(4) In a decision step 760, the encoder evaluates whether the current Level value is larger than the specified max_coef value for the current VLC table or not. If not, the same VLC table is applied to next Level. Otherwise, the VLC table 750 selection is changed to a new one based on the Table definition illustrated at right in FIG. 9.

(5) The encoder (step 720) reads next Level, and then refers to the evaluated VLC table and gets symbol value by step 730.

(6) The encoder repeats (3), (4) and (5) until it reaches the last Level in the block.

(7) The encoder asserts an EOB (end of block) code when it reaches the last Level in the block.
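The table-selection loop of the FIG. 9 flow can be illustrated with a minimal sketch. The threshold list HYPOTHETICAL_MAX_COEF below is a placeholder, not the Table definition of FIG. 9, and the function names next_table and tables_used are introduced here only for illustration. The sketch also assumes that the comparison of step 760 uses the magnitude of the Level value.

```python
# Placeholder max_coef thresholds for the 7 Luma Inter tables.
# These numbers are NOT the FIG. 9 values; they are assumptions for illustration.
HYPOTHETICAL_MAX_COEF = [0, 1, 2, 4, 7, 10, float("inf")]


def next_table(current, level):
    """Decision step 760: keep the table unless the Level exceeds its max_coef."""
    magnitude = abs(level)  # assumption: the comparison uses the magnitude
    if magnitude <= HYPOTHETICAL_MAX_COEF[current]:
        return current
    # Otherwise move to the first later table whose threshold covers the Level.
    for index in range(current + 1, len(HYPOTHETICAL_MAX_COEF)):
        if magnitude <= HYPOTHETICAL_MAX_COEF[index]:
            return index
    return len(HYPOTHETICAL_MAX_COEF) - 1


def tables_used(levels):
    """Return the table index applied to each Level, read in AC-to-DC order."""
    table = 0  # step 710: encoding of a block starts from table 0
    used = []
    for level in levels:
        used.append(table)                # steps 720/730: code with current table
        table = next_table(table, level)  # step 760: update for the next Level
    return used
```

With these placeholder thresholds, the Levels 1, -1, 3, 2, 8 are coded with tables 0, 1, 1, 3, 3 respectively. The point of the sketch is that the table for the next Level depends only on Level values already seen, never on the Run that is still being counted.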

In FIG. 10A, all 19 VLC tables are implemented into encoder H/W storage in some embodiments. In FIG. 10A, an entropy encoder core for AVS residual layer has Context Manager, VLC Encoder, 19 VLC tables and 4 Exp-Golomb Encoder cores. The scanned results flow into the Context Manager and VLC Encoder. The Context Manager checks the value of the scanned coefficients, and defines which VLC table and which Exp-Golomb encoder core 0, 1, 2 or 3 are to be applied. The VLC encoder converts the coefficients into symbol values by using the particular VLC table selected by the Context Manager. The output symbol from the VLC table is sent to the Exp-Golomb encoder, and the output of the Exp-Golomb encoder is sent to the stream buffer. In this way, encoding each coefficient is accomplished.

Conversely on conventional decode in FIG. 10B, the decoder core consists of 4 types of Exp-Golomb decoder, 19 VLC tables, VLC decoder and Context Manager. Firstly, the Exp-Golomb decoder reads the bitstream and obtains symbol and consumed bit length. The bit length is sent to stream buffer and defines a pointer of the stream buffer for decoding a next symbol. The obtained symbol is sent to VLC decoder. The VLC decoder decodes the symbol and obtains Level and Run by applying the VLC table selected by context manager. The obtained Level and Run are sent to inverse scan and Context Manager. The Context Manager updates the selection of VLC table and Exp-Golomb decoder to be applied to next coefficient. In this way, conventional decoding of a coefficient is accomplished.

In FIG. 10A, some embodiments of FIGS. 7A-8C pre-store all 19 tables for 2D-CAVLC in an albeit space-consuming internal ROM in an AVS H/W core to maintain performance in case of unlikely scenarios wherein most or all coefficients in a block have Level value. This approach is predicated or premised on avoiding a reduction in performance that could be caused by accumulation of numerous otherwise-external memory access latencies. Conventionally, given the opposed scanning and coding orders of FIG. 4, such latencies would seem unhideable and cumulative if the matter were considered at all, and such latencies would presumably be unavoidable. Several clock cycles would be spent or expended on each external memory access to read code of the CAVLC if it were not internal to the AVS H/W core, given the opposed scanning and coding orders. The circuit of FIG. 10A in some embodiments here is provided with the special scan block having reversed-scan, and the one-block latency of FIG. 5 is thereby eliminated in such embodiments.

Here, moreover, some additional embodiments of FIGS. 11A, 12A, 12C and 13A can even omit the 19-VLC ROM space in the encoder core H/W and are operable to boot-store or otherwise prepare all 19 CAVLC tables in an external large buffer 870 outside the H/W encoder core 800. Latency of each or most every external VLC table access is effectively eliminated or hidden and mainly or preponderantly killed by parallelizing and hiding the clock cycles of such access latency behind the clock cycles for the Run counter in the special scan block of FIG. 8B. This works because, on average usually and statistically, some AC coefficients are 0 (zero) and thus contribute to a Run that occupies Run Counter 528 cycles. And omitting the ROM table beneficially reduces logic size and power consumption in the encoder core. By reversing the opposed scanning and coding orders of FIG. 4 to have same scanning and coding orders, the embodiments of FIGS. 7C, 8B, 12A and 12C set aside any premise of substantial unhideable accumulation of memory access latencies and confer remarkable residual encoder embodiments for performance and economy.

Additionally, for an example VLC access latency-hiding embodiment, an encoding structure and process or method herein provides a volatile memory space 860 internal to the encoder core instead of the large internal ROM VLC space and furthermore reduces the size of the encoder H/W memory space for VLC to hold just one VLC table. From the description of FIG. 8A, and FIGS. 8B and 8C, one finds latency of scan and entropy coding for a period between two neighboring Levels (the former one is registered, and then encoded into bitstream when the latter one is found and registered). It means that the encoder 800 can check the formerly registered Level and read only one VLC table from external memory 870 outside of the encoder during a time period for checking next Level because latency due to memory access can be hidden by this internal latency of the encoder. Discussion of the structure and process for hiding the memory access latency behind the Run counter 528 is further provided in connection with FIG. 12A after describing this light and fast encoder embodiment at top level in FIG. 11A.

In FIG. 11A, storage for an 8×8 Residual Block is situated at and occupies the point in the encoder block diagram of FIG. 13A labeled Residual Coefficient. Analogously, in FIG. 11B, the Residual Block is situated at and occupies the point in the decoder of FIG. 13B labeled Residual Coefficient. FIGS. 12A and 12B respectively represent a detail of the blocks of FIGS. 11A and 11B.

In the FIG. 11A encoder embodiment, the Residual Block is scanned out from two dimensions to one dimension (2D to 1D) in the special AC-to-DC order 63 to zero as described earlier hereinabove for FIGS. 7A-8C. Level and Run-length are output as in waveform FIG. 8A, block diagram FIG. 8B and flow FIG. 8C. The scanned out coefficients and Level and Run-length are fed to the Entropy Encoder core as described for FIG. 12A so as to provide a pipelined flow that reads one coefficient in one clock and encodes at a rate of one symbol per clock to produce a bitstream as output from Entropy Encoder block of FIGS. 11A, 12A and 13A to a stream buffer in FIGS. 11A and 12A.

Conversely in a FIG. 11B decoder embodiment, an analogous bitstream from a stream buffer in FIG. 11B is input to an Entropy Decoder Core in Entropy Decode block of FIGS. 11B, 12B and 13B. Entropy Decoder Core 900 outputs 1D stream of coefficients along with Level and Run-length to an Inverse Scan block 940 that outputs coefficients to fill up the 2D Residual Block 950 with residual coefficients. On decode in this embodiment, the order of inverse scan is arranged to be same as order of entropy decoding within a block. In this way the Decoder provides a pipelined flow that decodes at a rate of one symbol per clock and inverse-scan-writes one coefficient in one clock to fill up the Residual Block in two dimensions 2D in an embodiment for decode that operates conversely or inversely to an embodiment for encode. A bitstream symbol includes or represents Level and Run information. Each successive Level cell is written with its Level value at a Level cell address formed by summing or combining the last Level cell address and the Run information to work downward in the block in AC-to-DC order. All cells in the 8×8 block storage 950 have been previously initialized with zeroes before decoding the block. Intermediate Run cells intermediate the Level cells remain initialized with zeroes in the 8×8 block storage and are skipped over in the addressing process and automatically constitute the zero-valued Run coefficients by default. Overall, a stream buffer 910 supplies a variable-length-coded bit stream to a Golomb decoder 920 supported via a mux 936 by Exp-Golomb Decoder cores 937 for exponents 0, 1, 2, 3. A VLC Decoder 930 aided by an applicable VLC Table in small memory space 960 decodes each symbol from Golomb decoder 920 and outputs Level & Run value pairs to Inverse Scan 940 and Context Manager 980 unbuffered. Context Manager 980 supplies VLC Table select control to external large buffer 970 and thereby delivers the applicable VLC coding table into small memory space 960. Decode of a subsequent Run-Level is performed on the basis of the next-previous Run-Level information. Context Manager 980 also supplies an exponent select control Exp._Select to control selection by mux 936.
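The inverse-scan addressing described for the decoder can be sketched as follows, under stated assumptions. The function name inverse_scan and the parameter start_pos are hypothetical; how the decoder obtains the scan position of the first decoded Level is not stated in this passage, so it is passed in explicitly here. The block is modeled as a 1D list in scan order, with index 0 as DC, and the mapping to 2D cell addresses (FIG. 7B) is omitted. Run is taken as the number of zero cells between a Level and the next Level toward DC.

```python
def inverse_scan(pairs, start_pos, size=64):
    """Write (Level, Run) pairs, given in AC-to-DC order, into a zeroed block.

    pairs     : list of (level, run) tuples in decode order
    start_pos : scan index of the first decoded Level (assumed to be known)
    size      : number of cells, 64 for an 8x8 block
    """
    block = [0] * size  # all cells are initialized with zeroes before decoding
    pos = start_pos
    for level, run in pairs:
        if pos < 0:
            raise ValueError("Run/Level pairs extend past the DC position")
        block[pos] = level  # write the Level cell
        pos -= run + 1      # skip the Run cells; they keep their initial zeroes
    return block
```

For example, the pairs (-1, 1), (3, 2), (5, 0) with start_pos 5 fill scan positions 5, 3 and 0 and leave positions 4, 2 and 1 at zero. No separate write is needed for the zero-valued cells.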

Returning to encode, an encoder embodiment of FIG. 12A provides a reduced memory area 860 inside the encoder core H/W to store VLC table data from as little as only one VLC table, which allows or delivers encoder memory size reduction in an embodiment herein for achieving an AVS residual encoder. The data for the 19 VLC tables is instead put in a FIG. 12A cache 870 (e.g., a Level 2 cache L2 or SL2) where the data are readily accessible and the cache 870 access latency is acceptable and hidden herein and the cache space is re-usable for other applications when the encoder 800 is not executing.

Regarding table selection in FIG. 9 and FIG. 12A, in FIG. 9, the Level value is only applied to select the next VLC table. In FIG. 12A, the Level value, or both the Level value and Run value are applied to select a next symbol value in the next VLC table. Two operations are applied in FIG. 12A: 1) selecting the next VLC table from cache 870 and 2) deriving a symbol value from the selected VLC table in small space 860. FIG. 12A saves memory area. If the Run value is a large value, or even a modestly significant value, the embodiment also saves and hides behind clock cycles consumed for counting the number of consecutive zero-coefficients by checking only one symbol value from VLC table in or obtained from L2 cache. In this structure and method, a symbol value applied to Run/Level is only read from a VLC table in table space 860, a VLC table that is selected from the VLC tables stored in external L2 cache. This way, the encoder does not have to have space for so many VLC tables, indeed table space 860 for one VLC table in the encoder hardware is enough herein.

Moreover, reading the 2D CAVLC table is parallelized and hidden in some embodiments. Here, clock cycles for Run-counting the zero-valued AC coefficients are also consumed for reading the selected one VLC table out of 19 tables from L2 cache into internal VLC space reduced to hold just one VLC table. The same clock cycles are used both for Run-counting each zero-valued coefficient and for reading the selected single VLC table.

In FIGS. 12A, 12C and 8A, 8B, the encoder scan block 820 consumes successive cycles for incrementing Run counter 528 when there is a multi-cycle Run length. Some embodiments beneficially utilize the same several clock cycles that Run counting occupies to read code of the Level-selected Huffman VLC table from cache memory 870 that holds 19 VLC tables into the residual encoder storage space 860 having VLC storage area for just one VLC table. Such embodiments herein can accomplish this because the reversed-scan embodiment of FIG. 7C uses the Level Detector 524 and Level register 526 of FIGS. 8A and 8B to acquire the Level value, on which the VLC table selection (step (4) in description of FIG. 9) depends, before or just as the Run counter 528 of FIG. 8B commences counting Run-length, not afterwards. Accordingly, these operations just-in-time parallelize and hide some or all of the clock cycles involved in accessing the thereby-selected VLC table from cache.

In FIG. 8B, the line AC Encode Enable triggers a selection and read of data from one of the 19 VLC tables in L2 cache into the single VLC table space of FIG. 12A, 12C in the entropy encoder hardware. Note that the AC Encode Enable line in FIG. 8B is the same line as (or coupled with insignificant delay to) the Enable for Run Counter 528 in FIG. 8B. The AC Encode Enable line in FIG. 8B goes to blocks as appropriate to activate the entropy encoder parts of FIG. 12A, 12C and specifically here to the Context Manager 880 in FIGS. 12A, 12C that commences the VLC Table access from cache 870 compatibly with FIG. 9. Consequently, the Run counting in the FIGS. 8B and 12A, 12C Scan block 820 and the accessing of selected VLC table from cache 870 in FIG. 12A and FIG. 12C are concurrent and thus hide some or all of the cache 870 memory access latency.

FIG. 12C emphasizes the important parts of the AVS or H.264 encoder embodiment shown therein. The encoder scan block 820 uses a scan order AC-to-DC, which is a special reversed-scan to match the entropy coding order that is applied in the AC-to-DC direction of FIG. 7C to Run/Level data pipelined in from the scan block. The Table select from Context Manager 880 to external buffer in cache 870 (e.g., L2/SL2) is Level-based and initiated or actuated by AC Encode Enable in FIGS. 8B and 12C without need of the as-yet-in-progress counting of Run-length. Compare FIG. 12C to FIG. 9 Level-based table selection from FIG. 10A ROM. Cache memory 870 access clock cycles are thereby parallelized and hidden in FIG. 12C behind the oncoming Run counter 528 clock cycles in FIGS. 8A, 8B. When the Run count does count up and establish the Run-length, the now-established Run-length is signaled by the circuitry to the VLC candidate table space 860 in the entropy encoder real-estate 800 and/or otherwise as appropriate to facilitate the completion of the entropy coding.

Scan 820 scans block 810 and outputs Level/Run value pairs to Context Manager 880 unbuffered on line 824. Likewise, Scan 820 sends the Level/Run value pairs on line 826 to a symbol encoder including Escape Decision circuit 839 coupled (non-escape) to VLC Encoder 832. A Mux 838 supplies Golomb Encoder 834 with each latest symbol from VLC Encoder 832 or from Escape Decision circuit 839 depending whether an escape condition exists or not. Context Manager 880 supplies VLC Table select control to external large buffer 870 and thereby delivers the applicable VLC coding table into small memory space 860 that acts as a coding memory space able only to hold substantially fewer than the plurality of coding tables in buffer 870. VLC encoder 832 is coupled to access only the coding memory space 860 instead of the re-usable store space 870 for coding table information. Golomb encoder 834 is supported via a mux 836 by Exp-Golomb Encoder cores 837 for exponents 0, 1, 2, 3. VLC Encoder 832 aided by the applicable VLC Table in small memory space 860 encodes each Level/Run value pair from Scan 820 into a symbol for Golomb encoder 834. Context Manager 880 also supplies an exponent select control Exp._Select to control selection by mux 836 for Golomb encoder 834. Golomb encoder 834 encodes each symbol from mux 838 and supplies a resulting variable-length-coded bit stream to a stream buffer 840.
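As an illustration of what the four Exp-Golomb Encoder cores for exponents 0, 1, 2, 3 compute, the following is a minimal sketch of the textbook k-th order Exp-Golomb code for an unsigned symbol value. The function name exp_golomb is hypothetical, and the exact bit layout used by the AVS bitstream is not given in this passage, so the sketch should be read as the generic construction only.

```python
def exp_golomb(value, k):
    """Return the k-th order Exp-Golomb codeword of a non-negative integer.

    The codeword is (value + 2**k) written in binary, preceded by
    (bit_length - k - 1) zero bits. Returned as a string of '0'/'1'.
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    shifted = value + (1 << k)
    prefix_zeros = shifted.bit_length() - k - 1
    return "0" * prefix_zeros + format(shifted, "b")


def select_and_encode(value, exp_select):
    """Model of mux 836: exp_select (0..3) picks which order is applied."""
    if exp_select not in (0, 1, 2, 3):
        raise ValueError("exp_select must be 0, 1, 2 or 3")
    return exp_golomb(value, exp_select)
```

For order 0 the values 0, 1, 2, 3 map to 1, 010, 011, 00100. A higher order spends more bits on small values and fewer on large ones, which is why a context-dependent order select is useful.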

In FIG. 12A and FIG. 12C, the table selection from the 19 VLC tables is responsive to the FIG. 8B Level value (and not Run) input to the Entropy Encoder when AC Encode Enable of FIG. 8B goes active.

As described hereinabove for FIG. 7C, the encoder scanning order is the same as the entropy coding order. The encoder embodiments of FIGS. 7A-13B start scanning the residual coefficients in a block from the last (64th) AC position to DC position according to the following flow steps augmented to hide VLC cache accesses:

(1) Encoder starts scanning coefficients from bottom-right (d63) of a block in FIGS. 7A and 7C in a scanning order represented by FIG. 7B.

(2) When the encoder finds a Level coefficient for the first time from (1), the Level coefficient is stored in the Level register 526 of FIGS. 8A and 8B.

(3) When the encoder finds a next Level coefficient after (2), the encoder uses the selection procedure of FIG. 9 to select and access one VLC table from 19 VLC tables in L2 cache 870 and transfer the selected VLC Table data thus selected into much-reduced single-VLC-table space 860 on precious encoder real estate 800. Concurrently, the Run counter 528 value counts up and reaches the value that is equivalent to Run-length. The encoder entropy codes based on the already-registered Level coefficient and now-available Run counter value and the now-present selected VLC table. In the meantime, the successively-new Level coefficient is registered and the Run counter 528 is reset to 0 (zero).

(4) The encoder repeats action of (3) until scanning position reaches DC coefficient d00.

(5) When the encoder reaches DC position, an additional pulse is asserted in order to initiate entropy coding of the last Level coefficient (d00) with Run-length zero.
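Flow steps (1) to (5) can be sketched in software as below. This is a behavioral model only, not the circuit of FIG. 8B. The function name reversed_scan_pairs is hypothetical, and the block is assumed to be already arranged as a 1D list in scan order (index 0 is d00, index 63 is d63), so the 2D scanning pattern of FIG. 7B is not modeled. If d00 happens to be zero, the sketch emits the last registered Level with whatever Run has been counted, which is an assumption beyond the text.

```python
def reversed_scan_pairs(coeffs):
    """Scan from the last AC position down to DC and emit (Level, Run) pairs.

    Each pair is emitted when the NEXT Level is found, mirroring steps (2)-(3);
    the final pair is emitted by the extra pulse of step (5).
    """
    pairs = []
    level_register = None  # models Level register 526
    run_counter = 0        # models Run counter 528
    for pos in range(len(coeffs) - 1, -1, -1):   # step (1): AC-to-DC order
        coeff = coeffs[pos]
        if coeff != 0:
            if level_register is not None:
                pairs.append((level_register, run_counter))  # step (3): encode
            level_register = coeff  # step (2)/(3): register the new Level
            run_counter = 0         # reset Run counter to zero
        elif level_register is not None:
            run_counter += 1        # count zeros following a registered Level
    if level_register is not None:
        pairs.append((level_register, run_counter))  # step (5): final pulse
    return pairs
```

For a block whose only nonzero scan positions are 0, 3 and 5 with values 5, 3 and -1, the output is (-1, 1), (3, 2), (5, 0). The pairs come out already in the AC-to-DC coding order, so no Run and Level buffer is needed between scan and entropy coding.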

During scanning of residual coefficients, scan pipeline latency in FIGS. 8A and 8B comprehends or spans a number of clock cycles equal to Run-length. So, the encoder selects, transfers and/or prepares one VLC table in a small space 860 inside the encoder for the one VLC table selected from cache or other external memory 870 during this Run counter latency, which means that the read time for reading a VLC table is hidden by this Run counter latency. In this way, encoder processing time and overhead is removed or reduced.
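A toy arithmetic model may help show how much of a table read is hidden. It assumes, purely for illustration, that every Level triggers one table fetch of a fixed latency in clock cycles and that the fetch overlaps exactly the Run counter cycles for that Level. Neither the fixed latency nor the one-fetch-per-Level rule comes from the text, and the name exposed_stall_cycles is hypothetical.

```python
def exposed_stall_cycles(pairs, fetch_latency):
    """Per-Level cycles of table-fetch latency NOT hidden behind Run counting.

    pairs         : list of (level, run) tuples in AC-to-DC order
    fetch_latency : assumed fixed cycles to read one VLC table from cache
    """
    # A fetch overlapped with `run` counting cycles leaves only the excess visible.
    return [max(0, fetch_latency - run) for _level, run in pairs]
```

With an assumed latency of 2 cycles, the pairs (-1, 1), (3, 2), (5, 0) expose 1, 0 and 2 cycles respectively, against 6 cycles if nothing were overlapped. Longer Runs hide more of the access, in line with the statement that a large or even modest Run value hides the read.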

Benefits and solved problems conferred by some embodiments herein include any or all of the following, among others: 1) elimination of FIG. 6 buffer memory 335 for Level coefficient and Run-length, 2) integrated circuit real estate economy by eliminating that buffer memory space and its control circuitry, 3) power savings likewise, 4) elimination of FIG. 5 block-level pipeline latency during encoding residual layer, since the order of scan and of entropy coding no longer oppose each other, 5) reduction of VLC table space in the encoder, from 19 VLC table spaces to one VLC table space 860, and 6) hiding external memory accesses to VLC tables behind the encoder scan latency because reversed scan finds each Level value, on which VLC table selection depends, before a Run latency occurs and therefore the Run latency can be leveraged to hide the external memory accesses. Various embodiments make contributions to encoding/decoding HDTV images and other image types in real-time.

In FIG. 13A, an AVS encoder has Motion Estimation ME, Motion Compensation MC, intra prediction, transform T, quantization Q and loop-filter. Since AVS is very similar to H.264, embodiments of encoders and encoding as taught herein can be provided to improve performance and economy relative to both AVS and H.264 standards and other standards that can similarly benefit. An Entropy encoder block is improved remarkably as taught herein and placed or situated behind quantization. Here, output data from quantization Q is called a residual coefficient. The entropy encoder block such as in FIGS. 12A and 12C reads the residual coefficient and some information for syntax of Macroblock header, and converts them into an output bitstream. During encoding, exp-golomb code and 2D-CAVLC (context adaptive VLC) are applied with substantial performance enhancement, latency reduction, and improved real-estate and power economies as described herein. Feedback is provided by blocks for motion compensation MC, Intra Prediction, inverse transform IT, inverse quantization IQ and loop filter.

In FIG. 13A, a current Frame is fed from a Frame buffer to a summing first input of an upper summer. The upper summer has a subtractive second input that is coupled to the selector of a switch that selects between predictions for Inter and Intra Macroblocks. The upper summer subtracts the applicable prediction from the current Frame to produce Residual Data as its output. The Residual Data is fed to Transform T and then to Quantization Q. Quantization Q delivers quantized Residual Coefficients in 8×8 blocks, for instance, as discussed in other Figures herein for processing by an Entropy Encode block (also called the encoder in the other Figures such as in FIGS. 11A, 12A and 12C which include detailed FIG. 8B reverse-order scanning circuitry and entropy encoder circuitry).

Further in FIG. 13A, the Residual Coefficients are fed back through inverse quantization IQ and inverse transform IT to supply reconstructed Residual Data to a summing first input of a lower summer. The lower summer has a summing second input that is coupled to and fed by the already-mentioned selector of a switch that selects between the predictions for Inter and Intra Macroblocks. The lower summer adds the applicable prediction to the reconstructed Residual Data to produce a lower summer output. The lower summer output is 1) fed to a Loop Filter and 2) also feeds an Intra Prediction block to provide the switch with the Intra prediction, and 3) further feeds a first input of a block for Intra Prediction Mode Decision. The current Frame is fed to a second input of the block for Intra Prediction Mode Decision, which in turn delivers a mode decision to the Intra Prediction block.

The Loop Filter is coupled at its output to write into and store data in a Decoded Picture Buffer. Data is read from the Decoded Picture Buffer into two blocks designated ME (Motion Estimation) and MC (Motion Compensation). The current Frame is fed to motion estimation ME at a second input thereof, and the ME block supplies a motion estimation output to a second input of block MC. The block MC outputs motion compensation data to the Inter input of the already-mentioned switch. In this way, the image encoder is implemented in hardware, or executed in hardware and software in the IVA processing block of FIG. 14, and efficiently compresses image Frames and entropy encodes the resulting Residual Coefficients as taught herein.

The AVS decoder of FIG. 13B is a subset of FIG. 13A in that, compared to FIG. 13A, FIG. 13B substitutes a box Entropy Decode for Entropy Encode, uses the feedback blocks, and omits the blocks Frame (current) and associated block Intra Prediction Mode Decision, and further omits Motion Estimation ME, upper summer, Transform T and Quantization Q.

In the decoder embodiment of FIG. 13B, the entropy decoder block improved as in FIG. 12B, for one example, reads the bitstream and converts it into residual coefficients and some information for syntax of the Macroblock header such as motion vector and Macroblock type. An exp-golomb decoder and 2D-CAVLD are applied in the FIG. 12B improved entropy decoder block. The residual coefficients are inverse quantized in block IQ, and an inverse of the transform T is applied by block IT producing residual data. The residual data is applied to a FIG. 13B summer (lower summer of FIG. 13A). Summer output is fed to an Intra Prediction block and also via the Loop Filter to a Decoded Picture Buffer. Motion Compensation block MC reads the Decoded Picture Buffer and provides output to the Inter input of a switch for selecting Inter or Intra. Intra Prediction block provides output to the Intra input of that switch. The selected Inter or Intra output is fed from the switch to a second summing input of the summer. In this way, an image is constituted by summing the Inter or Intra data plus the Residual Data.

Some embodiments of the residual encoder are implemented in IVA-HD hardware of FIG. 14 or otherwise appropriately to form more comprehensive system-on-chip embodiments for larger device and system embodiments, as described next. In FIG. 14, a system embodiment 3500 improved as in the other Figures has an MPU subsystem and the IVA subsystem, and DMA (Direct Memory Access) subsystems 3510.i. The MPU subsystem suitably has one or more processors with CPUs such as RISC or CISC processors 2610, and having superscalar processor pipeline(s) with L1 and L2 caches. The IVA subsystem has one or more programmable digital signal processors (DSPs), such as processors having single cycle multiply-accumulates for image processing, video processing, and audio processing. IVA provides multi-standard (AVS, H.264, H.263, MPEG4, WMV9, RealVideo®) encode/decode at D1 (720×480 pixels), and 720p MPEG4 decode, for some examples. The AVS codec for IVA is improved for high speed and low real-estate impact as described in the other Figures herein. Also integrated are a 2D/3D graphics engine, a Mobile DDR Interface, and numerous integrated peripherals as selected for a particular system solution. The IVA subsystem has L1 and L2 caches, RAM and ROM, and hardware accelerators as desired such as for motion estimation, variable length codec, and other processing. DMA (direct memory access) performs target accesses via target firewalls 3522.i and 3512.i of FIG. 14 connected on interconnects 2640. A target is a circuit block targeted or accessed by another circuit block operating as an initiator. In order to perform such accesses the DMA channels in DMA subsystems 3510.i are programmed. Each DMA channel specifies the source location of the Data to be transferred from an initiator and the destination location of the Data for a target. Some Initiators are MPU 2610, DSP DMA 3510.2, SDMA 3510.1, Universal Serial Bus USB HS, virtual processor data read/write and instruction access, virtual system direct memory access, display 3510.4, DSP MMU (memory management unit), camera 3510.3, and a secure debug access port to emulation block EMU.

Data exchange between a peripheral subsystem and a memory subsystem and general system transactions from memory to memory are handled by the System SDMA 3510.1. Data exchanges within a DSP subsystem 3510.2 are handled by the DSP DMA 3518.2. Data exchange to store camera capture is handled using a Camera DMA 3518.3 in camera subsystem CAM 3510.3. The CAM subsystem 3510.3 suitably handles one or two camera inputs of either serial or parallel data transfer types, and provides image capture hardware image pipeline and preview. Data exchange to refresh a display is handled in a display subsystem 3510.4 using a DISP (display) DMA 3518.4. This subsystem 3510.4, for instance, includes a dual output three layer display processor for 1×Graphics and 2×Video, temporal dithering (turning pixels on and off to produce grays or intermediate colors) and SDTV to QCIF video format and translation between other video format pairs. The Display block 3510.4 feeds an LCD (liquid crystal display), plasma display, DLP™ display panel or DLP™ projector system, using either a serial or parallel interface. Also television output TV and Amp provide CVBS or S-Video output and other television output types.

In FIG. 14, a hardware security architecture including SSM 2460 propagates Mreqxxx qualifiers on the interconnect 3521 and 3534. The MPU 2610 issues bus transactions and sets some qualifiers on Interconnect 3521. SSM 2460 also provides one or more MreqSystem qualifiers. The bus transactions propagate through the L4 Interconnect 3534 and line 3538 then reach a DMA Access Properties Firewall 3512.1. Transactions are coupled to a DMA engine 3518.i in each subsystem 3510.i which supplies a subsystem-specific interrupt to the Interrupt Handler 2720. Interrupt Handler 2720 is also fed one or more interrupts from Secure State Machine SSM 2460 that performs security protection functions. Interrupt Handler 2720 outputs interrupts for MPU 2610. In FIG. 14, firewall protection by firewalls 3522.i is provided for various system blocks 3520.i, such as GPMC (General Purpose Memory Controller) to Flash memory 3520.1, ROM 3520.2, on-chip RAM 3520.3, Video Codec 3520.4, WCDMA/HSDPA 3520.6, device-to-device SAD2D 3520.7 to Modem chip 1100, and a DSP 3520.8 and DSP DMA 3528.8. In some system embodiments, Video Codec 3520.4 has codec embodiments as shown in the other Figures herein. A System Memory Interface SMS with SMS Firewall 3555 is coupled to SDRC 3552.1 (External Memory Interface EMIF with SDRAM Refresh Controller) and to system SDRAM 3550 (Synchronous Dynamic Random Access Memory).

In FIG. 14, interconnect 3534 is also coupled to Control Module 2765 and cryptographic accelerators block 3540 and PRCM 3570. Power, Reset and Clock Manager PRCM 3570 is coupled via L4 interconnect 3534 to Power IC circuitry in chip 1200 of FIGS. 1-3, which supplies controllable supply voltages VDD1, VDD2, etc. PRCM 3570 is coupled to L4 Interconnect 3534 and coupled to Control Module 2765. PRCM 3570 is coupled to a DMA Firewall 3512.1 to receive a Security Violation signal, if a security violation occurs, and to respond with a Cold or Warm Reset output. Also PRCM 3570 is coupled to the SSM 2460.

In FIG. 14, some embodiments have symmetric multiprocessing (SMP) core(s) such as RISC processor cores in the MPU subsystem. One of the cores is called the SMP core. A hardware (HW) supported secure hypervisor runs at least on the SMP core. Linux SMP HLOS (high-level operating system) is symmetric across all cores and is chosen as the master HLOS in some embodiments.

The system embodiments of and for FIG. 14 are provided in a communications system and implemented as various embodiments in any one, some or all of cellular mobile telephone and data handsets, a cellular (telephony and data) base station, a WLAN AP (wireless local area network access point, IEEE 802.11 or otherwise), a Voice over WLAN Gateway with user video/voice over packet telephone, and a video/voice enabled personal computer (PC) with another user video/voice over packet telephone, that communicate with each other. A camera CAM provides video pickup for a cell phone or other device to send over the internet to another cell phone, personal digital assistant/personal entertainment unit, gateway and/or set top box STB with television TV. Video storage and other storage, such as hard drive, flash drive, high density memory, and/or compact disk (CD) is provided for digital video recording (DVR) embodiments such as for delayed reproduction, transcoding, and retransmission of video to other handsets and other destinations.

In FIG. 14, a Modem integrated circuit (IC) 1100 supports and provides wireless interfaces for any one or more of GSM, GPRS, EDGE, UMTS, and OFDMA/MIMO embodiments. Codecs for any or all of CDMA (Code Division Multiple Access), CDMA2000, and/or WCDMA (wideband CDMA or UMTS) wireless are provided, suitably with HSDPA/HSUPA (High Speed Downlink Packet Access, High Speed Uplink Packet Access) (or 1×EV-DV, 1×EV-DO or 3×EV-DV) data feature via an analog baseband chip and RF GSM/CDMA chip to a wireless antenna. Replication of blocks and antennas is provided in a cost-efficient manner to support MIMO OFDMA of some embodiments. An audio block in an Analog/Power IC 1200 has audio I/O (input/output) circuits to a speaker, a microphone, and/or headphones as illustrated in FIG. 14. A touch screen interface is coupled to a touch screen XY off-chip in some embodiments for display and control. A battery provides power to mobile embodiments of the system and battery data on suitably provided lines from the battery pack.

DLP™ display technology from Texas Instruments Incorporated is coupled to one or more imaging/video interfaces. A transparent organic semiconductor display is provided on one or more windows of a vehicle and wirelessly or wireline-coupled to the video feed. WLAN and/or WiMax integrated circuit MAC (media access controller), PHY (physical layer) and AFE (analog front end) support streaming video over WLAN. A MIMO UWB (ultra wideband) MAC/PHY supports OFDM in 3-10 GHz UWB bands for communications in some embodiments. A digital video integrated circuit provides television antenna tuning, antenna selection, filtering, RF input stage for recovering video/audio and controls from a DVB station.

Various embodiments are used with one or more microprocessors, each microprocessor having a pipeline, and selected from the group consisting of 1) reduced instruction set computing (RISC), 2) digital signal processing (DSP), 3) complex instruction set computing (CISC), 4) superscalar, 5) skewed pipelines, 6) in-order, 7) out-of-order, 8) very long instruction word (VLIW), 9) single instruction multiple data (SIMD), 10) multiple instruction multiple data (MIMD), 11) multiple-core using any one or more of the foregoing, and 12) microcontroller pipelines, control peripherals, and other micro-control blocks using any one or more of the foregoing.

Various embodiments as described herein are manufactured in a process that prepares RTL (register transfer language) and netlist for a particular design including circuits of the Figures herein in one or more integrated circuits or a system. The design of the encoder and decoder and other hardware is verified in simulation electronically on the RTL and netlist. Verification checks contents and timing of registers, operation of hardware circuits under various configurations, AVS and H.264 latency and memory access latency-hiding, real-time and non-real-time operations and interrupts, responsiveness to transitions through modes, sleep/wakeup, and various attack scenarios. When satisfactory, the verified design dataset and pattern generation dataset go to fabrication in a wafer fab and packaging/assembly produces a resulting integrated circuit and tests it with real time video. Testing verifies operations directly on first-silicon and production samples such as by using scan chain methodology on registers and other circuitry until satisfactory chips are obtained. A particular design and printed wiring board (PWB) of the system unit, has a video codec applications processor coupled to a modem, together with one or more peripherals coupled to the processor and a user interface coupled to the processor. A storage, such as SDRAM and Flash memory is coupled to the system and has VLC tables, configuration and parameters and a real-time operating system RTOS, image codec-related software Public HLOS, protected applications (PPAs and PAs), and other supervisory software. System testing tests operations of the integrated circuit(s) and system in actual application for efficiency and satisfactory operation of fixed or mobile video display for continuity of content, phone, e-mails/data service, web browsing, voice over packet, content player for continuity of content, camera/imaging, audio/video synchronization, and other such operation that is apparent to the human user and can be evaluated by system use. Also, various attack scenarios are applied. If further increased efficiency is called for, parameter(s) are reconfigured for further testing. Adjusted parameter(s) are loaded into the Flash memory or otherwise, components are assembled on PWB to produce resulting system units.

Aspects (See Notes Paragraph at End of this Aspects Section)

11A. The encoder process claimed in claim 11 wherein the scanning and the variable-length encoding together have single-clock per coefficient performance.

11B. The encoder process claimed in claim 11 wherein the coefficients include residual coefficients.

14A. The encoder process claimed in claim 14 wherein the scanning produces at least one Level datum representing a non-zero image transform-based coefficient and at least one Run datum representing a count of consecutive zero-filled entries subsequent to the Level datum in the scanning order.

14B. The encoder process claimed in claim 14 further comprising, upon the scanning encountering a Level datum, then registering the Level datum and counting consecutive zero coefficients to generate a Run datum.
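
By way of illustration only, the scanning of Aspects 14A and 14B can be sketched in software as follows. This is a minimal sketch, not the disclosed circuit: the function name run_level_scan is hypothetical, the input is assumed to be a list of coefficients already arranged in the scanning order, and zero entries that precede the first Level datum are simply skipped, since how such entries are signalled is outside the sketch.

```python
def run_level_scan(coeffs):
    """Return (Level, Run) pairs for coefficients given in scanning order.

    Hypothetical helper for illustration. Run counts the consecutive zero
    entries that follow a Level datum in the scanning order (Aspect 14A).
    """
    pairs = []
    level = None  # registered Level datum still waiting for its Run count
    run = 0
    for c in coeffs:
        if c != 0:
            if level is not None:
                pairs.append((level, run))  # emit the completed pair
            level = c  # register the Level datum (Aspect 14B)
            run = 0    # start counting the zeros that follow it
        elif level is not None:
            run += 1   # consecutive zero coefficient after the Level
        # zeros before the first Level are skipped (assumption of this sketch)
    if level is not None:
        pairs.append((level, run))  # flush the final pair
    return pairs


pairs = run_level_scan([0, 3, 0, 0, -1, 2, 0, 7])
# pairs == [(3, 2), (-1, 0), (2, 1), (7, 0)]
```

In this sketch each coefficient is visited exactly once, which loosely mirrors the one-coefficient-per-step character of the scanning described above.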

25A. The decode circuit claimed in claim 25 wherein said Run-Level decoder includes a symbol decoder to produce the succession of pairs of values, and a data stream decoder having an input for a data stream and an output for a succession of symbols coupled to said symbol decoder.

25A1. The decode circuit claimed in claim 25A wherein the succession of pairs of values arriving from said symbol decoder has its AC to DC order corresponding to an order of succession of symbols from said data stream decoder.

25A2. The decode circuit claimed in claim 25A wherein said data stream decoder includes a set of exponential decoder cores.

25A2A. The decode circuit claimed in claim 25A2 further comprising a selection circuit operable in response to said symbol decoder to generate an exponent selection signal to select one of the exponential decoder cores for said data stream decoder to use to output a next symbol.

25A2A1. The decode circuit claimed in claim 25A2A wherein said data stream decoder, said selection circuit, said symbol decoder, and said inverse-scanner form a decoder core that also includes an internal VLC table memory space coupled to said symbol decoder and also to an input for receiving a selected single VLC table from elsewhere external to said decoder core.

25A3. The decode circuit claimed in claim 25A further comprising a stream buffer, wherein said data stream decoder is operable to read a data stream from said stream buffer and obtain the succession of symbols.

25A3A. The decode circuit claimed in claim 25A3 wherein said data stream decoder is further operable to decode a consumed bit length value, and to send the consumed bit length value to said stream buffer, and said stream buffer is responsive to the consumed bit length value from said data stream decoder to define a pointer of the stream buffer for decoding the next symbol.
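
By way of illustration only, the interaction of Aspects 25A2 through 25A3A between a data stream decoder and a stream buffer can be sketched in software as follows. This is a sketch under stated assumptions, not the disclosed circuit: the names StreamBuffer, decode_exp_golomb and decode_symbols are hypothetical, the data stream is modelled as a string of '0' and '1' characters for readability, and the exponential decoder cores are assumed to be k-th order Exp-Golomb decoders selected by the parameter k.

```python
class StreamBuffer:
    """Hypothetical stream buffer holding a bit string and a read pointer."""

    def __init__(self, bits):
        self.bits = bits  # e.g. "101000100"
        self.pos = 0      # pointer used for decoding the next symbol

    def peek(self):
        return self.bits[self.pos:]

    def advance(self, consumed):
        # The consumed bit length from the decoder defines the new pointer.
        self.pos += consumed


def decode_exp_golomb(window, k=0):
    """Decode one k-th order Exp-Golomb symbol; return (symbol, consumed)."""
    n = 0
    while window[n] == "0":  # count leading zero bits
        n += 1
    consumed = 2 * n + 1 + k
    prefix = int(window[n:2 * n + 1], 2) - 1
    suffix = int(window[2 * n + 1:consumed], 2) if k else 0
    return (prefix << k) | suffix, consumed


def decode_symbols(buf, count, k=0):
    """Read `count` symbols, feeding each consumed length back to the buffer."""
    symbols = []
    for _ in range(count):
        symbol, consumed = decode_exp_golomb(buf.peek(), k)
        buf.advance(consumed)
        symbols.append(symbol)
    return symbols


buf = StreamBuffer("101000100")
symbols = decode_symbols(buf, 3)  # [0, 1, 3], pointer ends at bit 9
```

A malformed window containing no '1' bit raises an IndexError in this sketch; error handling is omitted for brevity.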

25A4. The decode circuit claimed in claim 25A wherein said data stream decoder includes a bit stream decoder.

25B. The decode circuit claimed in claim 25 wherein said inverse scanner is operable to thereby populate the block with residual coefficients.

25C. The decode circuit claimed in claim 25 further comprising a summing circuit, an image inverse-transform-based circuit fed by said inverse scanner and supplying a first input of said summing circuit, and an image prediction circuit coupled to another input of said summing circuit, whereby a decoded image is obtainable from said summing circuit.

29A. The decode circuit claimed in claim 29 wherein said selection circuit is responsive to at least a current one of the Level values to supply a selection signal to select VLC table information to feed into said Run-Level decoder.

29B. The decode circuit claimed in claim 29 wherein said selection circuit is previously initialized to supply an initial selection signal to select initial VLC table information to feed into said Run-Level decoder.
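
By way of illustration only, a selection circuit of the kind recited in Aspects 29A and 29B can be sketched in software as follows. This is a sketch under stated assumptions: the class name TableSelector is hypothetical, the threshold values are illustrative placeholders and are not taken from AVS or any other standard, and tracking the largest Level magnitude seen so far is one assumed way of being responsive to at least a current one of the Level values.

```python
class TableSelector:
    """Hypothetical selection circuit model for choosing a VLC table index."""

    def __init__(self, thresholds=(1, 2, 4)):
        self.thresholds = thresholds  # illustrative placeholders only
        self.max_level = 0
        self.index = 0  # initial selection signal (Aspect 29B)

    def update(self, level):
        """Update the selection signal from the current Level value."""
        self.max_level = max(self.max_level, abs(level))
        # Select a higher-numbered table as larger magnitudes are seen.
        self.index = sum(1 for t in self.thresholds if self.max_level > t)
        return self.index


selector = TableSelector()
signals = [selector.update(level) for level in (1, -3, 1, 7)]
# signals == [0, 2, 2, 3]
```

The returned index would then choose which single table is fed into the Run-Level decoder for the next symbol.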

31A. The decoding process claimed in claim 31 wherein all cells in a block of residual block storage are previously initialized with zeroes before inverse-scanning the block.

31A1. The decoding process claimed in claim 31A wherein run cells between coefficient cells remain initialized with zeroes in the block and are skipped over, whereby automatically constituting zero-valued entries.
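
By way of illustration only, the inverse scanning of Aspects 31A and 31A1 can be sketched in software as follows. This is a minimal sketch, not the disclosed circuit: the function name inverse_scan and the parameters scan_positions, size and start are hypothetical, scan_positions is assumed to list the (row, column) cells of the block in the same AC to DC order in which the pairs arrive, and each Run value is assumed to count the zero cells that follow its Level value in that order.

```python
def inverse_scan(pairs, scan_positions, size, start=0):
    """Populate a size x size block from (Level, Run) pairs.

    Hypothetical helper for illustration. The block is initialized with
    zeroes first, so run cells are skipped over and never written.
    """
    block = [[0] * size for _ in range(size)]  # all cells initialized to zero
    idx = start
    for level, run in pairs:
        row, col = scan_positions[idx]
        block[row][col] = level  # write only the coefficient cell
        idx += 1 + run           # skip the run cells; they remain zero
    return block


# A 2x2 example with an assumed scan order from the last AC cell to DC.
positions = [(1, 1), (1, 0), (0, 1), (0, 0)]
block = inverse_scan([(5, 2), (9, 0)], positions, size=2)
# block == [[9, 0], [0, 5]]
```

Because only coefficient cells are written, the number of writes in this sketch equals the number of pairs rather than the number of cells in the block.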

32A. The decoding process claimed in claim 32 wherein the decoding performance is one coefficient per clock.

38A. The electronic system claimed in claim 38 further comprising a camera circuit to deliver an image signal, and an image encoder operable to encode the image signal, said modem operable to send a second transmission responsive to said encoder.

Notes: Aspects are paragraphs which might be offered as claims in patent prosecution. The above dependently-written Aspects have leading digits and internal dependency designations to indicate the claims or aspects to which they pertain. Aspects having no internal dependency designations have leading digits and alphanumerics to indicate the position in the ordering of claims at which they might be situated if offered as claims in prosecution.

Processing circuitry comprehends digital, analog and mixed signal (digital/analog) integrated circuits, ASIC circuits, PALs, PLAs, decoders, memories, and programmable and nonprogrammable processors, microcontrollers and other circuitry. Internal and external couplings and connections can be ohmic, capacitive, inductive, photonic, and direct or indirect via intervening circuits or otherwise as desirable. Process diagrams herein are representative of flow diagrams for operations of any embodiments whether of hardware, software, or firmware, and processes of manufacture thereof. Flow diagrams and block diagrams are each interpretable as representing structure and/or process. While this invention has been described with reference to illustrative embodiments, this description is not to be construed in a limiting sense. Various modifications and combinations of the illustrative embodiments, as well as other embodiments of the invention may be made. The terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in the detailed description and/or the claims to denote non-exhaustive inclusion in a manner similar to the term “comprising”. The appended claims and their equivalents cover any such embodiments, modifications, and embodiments as fall within the scope of the invention.

What is claimed is:
 1. A decode circuit comprising: a Run-Level decoder operable to deliver a succession of pairs of values, each pair including a Run value and a Level value; and an inverse scanner responsive to the succession of pairs of values to populate a block with the Level values including one or more AC values and a DC value, the Level values spaced apart in the block by runs having lengths represented by the Run values, and the inverse-scanner is operable to sequentially populate the block using the Level values and the Run values in a same AC to DC order as in the succession of pairs of values from the Run-Level decoder.
 2. The decode circuit of claim 1 in which the Run-Level decoder includes a memory space for only a single coding table and including a symbol decoder to supply the succession of pairs of values, the symbol decoder coupled to directly access only the memory space for the coding table information to reduce coding table memory space.
 3. The decode circuit of claim 1 including a store for a plurality of different coding tables, and a selection circuit operable to supply a selection signal to the store to deliver a single applicable coding table at a time to the memory space.
 4. The decode circuit of claim 3 in which the store is a re-usable store selected from the group consisting of 1) cache, and 2) random access memory (RAM), whereby the memory space for the single coding table saves overall space for the decode circuit.
 5. The decode circuit of claim 1 including a selection circuit responsive to the Run-Level decoder to select information to feed into the Run-Level decoder.
 6. An electronic system comprising: a modem operable to receive a transmission; a Run-Level decoder responsive to the transmission to deliver a succession of pairs of values, each pair including a Run value and a Level value; an inverse scanner responsive to the succession of pairs of values to populate a block with the Level values including one or more AC values and a DC value, the Level values spaced apart in the block by runs having lengths represented by the Run values, and the inverse-scanner operable to sequentially populate the block using the Level values and the Run values in a same order as in the succession of pairs of values from the Run-Level decoder; and a display circuit operable in response to the inverse scanner to form an image signal.
 7. The electronic system of claim 6 adapted to form a device selected from the group consisting of 1) cellular telephone, 2) projector, 3) mobile entertainment device, 4) personal computer.